Uber API Design Evaluation and Latency Budget
Learn how we meet the non-functional requirements and estimate the response time of the Uber APIs.
Introduction#
We've discussed the design considerations and API model for the functional requirements of our Uber service. In this lesson, we'll cover different approaches to achieve our non-functional requirements and estimate the response time of our Uber services.
Non-functional requirements#
The non-functional requirements for our APIs are availability, low latency, scalability, and security. Let's understand how we can achieve our requirements.
Availability#
We ensure the availability of the services by decoupling them. For example, the driver service keeps working and maintains up-to-date driver locations even if the rider service goes down temporarily. The availability of our services also depends on the supporting services. For example, if Google Maps is unavailable, we have alternative map services (such as MapQuest and Waze), although these services may not support all the features that Google Maps offers. Our service also supports multiple payment methods. If one payment service is down, we have other payment methods, such as Uber balance or simply paying in cash. Moreover, we prioritize requests such as ongoing trips and payments, and use rate limiting at the API gateway for other requests to prevent the services from being overloaded by users' requests. If a service is down or overloaded, multiple replicas of the service are utilized to ensure availability.
Note: Uber may also facilitate integration with local payment gateways depending on the region of the service. Therefore, the Uber service will not go down because of an outage in any single payment gateway.
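The gateway-side rate limiting mentioned above could be sketched as a token bucket. The following is a minimal illustration, not Uber's actual implementation; the class name, capacity, and refill rate are assumptions chosen for the example:

```python
import time

class TokenBucket:
    """Illustrative token-bucket rate limiter, as an API gateway might
    apply to low-priority requests while letting trip and payment
    traffic bypass it."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # tokens currently available
        self.refill_rate = refill_rate  # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(capacity=5, refill_rate=1.0)
# A burst of 7 immediate requests: the first 5 pass, the rest are shed.
print([limiter.allow() for _ in range(7)])
```

Requests rejected here would receive an HTTP 429 response from the gateway rather than overloading the backing service.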
Scalability#
Since most of the communication in our design happens through the pub-sub service, we create multiple replicas of that service to avoid a single point of failure (SPOF). We can use different instances of the pub-sub service for unrelated combinations of riders/drivers to distribute the load. This also allows us to decouple services and enhance the scalability of the API. The stateless nature of requests to our services allows us to replicate the services and forward any request to any available server. However, the scalability of our services also depends on the scalability of the supporting services. For example, if any supporting service has scaling issues, it could become a bottleneck for our service. Services like Google Maps are highly scalable and use CDNs to reduce the risk of disruption by serving static data specific to a customer's region, so there's little chance that such supporting services will affect our service.
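Distributing unrelated rider/driver combinations across pub-sub instances can be done by hashing a pair key to pick an instance. This is a minimal sketch under assumed names; the instance pool and key format are illustrative, not part of the actual design:

```python
import hashlib

# Hypothetical pool of pub-sub instances; names are illustrative.
PUBSUB_INSTANCES = ["pubsub-0", "pubsub-1", "pubsub-2"]

def pubsub_for(rider_id: str, driver_id: str) -> str:
    """Route a rider/driver pair to a pub-sub instance by hashing the
    pair key, spreading unrelated pairs across the replicas."""
    key = f"{rider_id}:{driver_id}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return PUBSUB_INSTANCES[digest % len(PUBSUB_INSTANCES)]

# The same pair always maps to the same instance, so messages for one
# trip stay on one instance while load spreads across the pool.
print(pubsub_for("rider-42", "driver-7"))
```

A production system would more likely use consistent hashing so that adding or removing an instance remaps only a fraction of the pairs; a simple modulo is used here for brevity.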
Point to Ponder
Question
Is it possible to load balance client requests while maintaining client sessions?
Yes, it is possible to load balance incoming client requests while maintaining the client's sessions on the server side. The following are two common techniques to achieve this:
- Sticky sessions: In sticky sessions, the load balancer (API gateway) uses the URL, session ID, and other information to identify the original server and redirect the request to that same server.
- Shared sessions: In the shared-session approach, the session is stored in some common storage (memory, cache, etc.) to allow any server to seamlessly handle requests from different clients with different session IDs.
Note: While a session is being maintained, the response time of a request will most likely increase compared to a normal request (a request without a session).
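The shared-session approach can be sketched as follows. Here an in-process dictionary stands in for a common store such as a distributed cache, and the TTL value is an illustrative assumption:

```python
import time

class SharedSessionStore:
    """Sketch of the shared-session approach: every app server reads
    and writes sessions from one common store, so the load balancer can
    route any request to any server."""

    def __init__(self, ttl_seconds: float = 1800):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (payload, expiry timestamp)

    def put(self, session_id: str, payload: dict) -> None:
        self._data[session_id] = (payload, time.monotonic() + self.ttl)

    def get(self, session_id: str):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        payload, expiry = entry
        if time.monotonic() > expiry:   # session expired; drop it
            del self._data[session_id]
            return None
        return payload

store = SharedSessionStore()
store.put("sess-123", {"rider_id": "rider-42"})
# Any server behind the load balancer can now resolve the session.
print(store.get("sess-123"))  # {'rider_id': 'rider-42'}
```

The extra round trip to the shared store is one source of the added response time the note above mentions.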
Security#
Our API allows only authenticated users to use the resources. Users can log in to Uber using a legitimate email address or phone number and use a verification code to verify the entered credentials. Moreover, we use encrypted tokens (issued only if the person is in the trusted contact list of the rider) while sharing the trip details with others to ensure the details are shared with the intended person. We don't allow direct interaction between supporting services and client devices because clients could manipulate trip details (inaccurate distances, wait times, and fare estimates) by directly requesting the supporting services.
We allow the third-party login only through OAuth 2.0 and OIDC using an authorization code and proof key for code exchange (PKCE) flow to obtain a third-party access token. Access tokens mitigate the risk of data leakage while logging in with third-party applications. Additionally, we transport all data in encrypted form using a secure transport layer protocol such as TLS 1.3 so that attackers can’t access the user's personal information.
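The PKCE portion of the flow above hinges on deriving a code challenge from a random verifier, as specified in RFC 7636 (S256 method). A minimal sketch of that derivation, with the function name being our own:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge per
    RFC 7636: challenge = BASE64URL(SHA-256(verifier)), with '='
    padding stripped."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# The client sends `challenge` in the authorization request and later
# proves possession of `verifier` when exchanging the authorization
# code for the access token.
print(len(verifier), len(challenge))  # 43 43
```

Because only the challenge crosses the authorization request, an attacker who intercepts the authorization code still cannot redeem it without the verifier.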
Low latency#
The driver service frequently receives updates about drivers' locations and stores this information in memory to instantly find nearby drivers. We also pre-estimate the fare for regularly visited places of clients who use Uber daily and cache these fares to quickly respond to ride requests. The persistent connection between the driver app and the driver service helps us reduce network latency. Furthermore, we use pagination for the trip history to limit the amount of data returned per request.
Note: The pre-estimated fare is only an estimate that appears when choosing a different vehicle type, while the actual fare is calculated when the ride request is sent. If there is a significant difference, Uber will prompt the rider to accept the fare before notifying the drivers.
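The in-memory nearby-driver lookup mentioned above can be sketched as a scan over cached locations. The coordinates, driver IDs, and search radius below are made-up illustrations; a production system would use a geospatial index rather than a linear scan:

```python
import math

# In-memory driver locations (driver_id -> (lat, lon)); values are
# illustrative, roughly around the San Francisco Bay Area.
driver_locations = {
    "driver-1": (37.7749, -122.4194),
    "driver-2": (37.7849, -122.4094),
    "driver-3": (37.8044, -122.2712),  # ~15 km away, across the bay
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearby_drivers(rider_pos, radius_km=3.0):
    """Scan the in-memory map for drivers within radius_km of the rider."""
    return [d for d, pos in driver_locations.items()
            if haversine_km(rider_pos, pos) <= radius_km]

print(nearby_drivers((37.7755, -122.4180)))  # ['driver-1', 'driver-2']
```

Keeping this data in memory avoids a database round trip on the hot path of matching a rider to a driver.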
Achieving Non-Functional Requirements
| Non-Functional Requirements | Approaches |
|---|---|
| Availability | Decoupled services; alternative map and payment providers; request prioritization and rate limiting at the API gateway; replicated services |
| Scalability | Replicated pub-sub instances for unrelated rider/driver combinations; stateless, replicable services; highly scalable supporting services (CDNs) |
| Security | Authentication with verified credentials; encrypted tokens for trip sharing; OAuth 2.0/OIDC with PKCE for third-party login; TLS 1.3 for data in transit |
| Low latency | In-memory driver locations; cached pre-estimated fares; persistent connections; pagination for trip history |
Latency budget#
In this section, we'll estimate the response time of Uber APIs. Various APIs coordinate under the hood of the Uber system. We'll organize this section by the type of request sent (GET and POST): we'll first estimate request and response sizes and then calculate the response times at the end of the lesson.
As discussed in the back-of-the-envelope latency calculations, the latency of GET and POST requests is affected by two different parameters. In the case of GET, the average RTT remains the same regardless of the data size (due to the small request size), and the time to download the response grows at a fixed rate per KB. Similarly, for POST requests, the RTT grows with the data size at a fixed rate per KB on top of the base RTT, which was 260 ms.
Request and response size#
We'll estimate the response time for the POST method to book a ride and the GET method to get ride history. The size of each request is estimated below:
Booking a ride#
- Request size: Assume the request size is about 800 bytes, which includes the vehicle type, source and destination addresses, etc. The total request size, including headers, is approximately 1.5 KB.
- Response size: The response to a POST request for booking a ride is approximately 1 KB.
We'll only consider the request size, as per our convention, because the response is a standard 1 KB and only the request size affects the response time in the case of POST. The overall response time can be calculated as follows:
Response Time Calculator for Booking a Ride
| Metric | Value | Unit |
|---|---|---|
| Request size | 1.5 | KB |
| Minimum latency | 382.625 | ms |
| Maximum latency | 463.625 | ms |
| Minimum response time | 386.625 | ms |
| Maximum response time | 467.625 | ms |
Assuming the request size is 1.5 KB, the latency is calculated as follows (using the 81.75 ms/KB rate and the 260–341 ms base RTT range implied by the calculator values):

Latency (minimum) = 260 ms + (1.5 KB × 81.75 ms/KB) = 382.625 ms
Latency (maximum) = 341 ms + (1.5 KB × 81.75 ms/KB) = 463.625 ms

Similarly, the response time is calculated by adding the 4 ms processing time:

Response time (minimum) = 382.625 ms + 4 ms = 386.625 ms
Response time (maximum) = 463.625 ms + 4 ms = 467.625 ms
Get rides history#
- Request size: The request size is approximately 1 KB for a GET method because the request body is empty.
- Response size: Let's assume each ride record is approximately 1.5 KB in size. If each request returns eight records, the response size of this GET request would be approximately 13 KB (including 1 KB for the header).
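The response-size arithmetic above can be expressed as a small helper. The function name and default sizes are taken from the numbers in the text:

```python
def ride_history_response_kb(records: int,
                             record_kb: float = 1.5,
                             header_kb: float = 1.0) -> float:
    """Size of one paginated ride-history response:
    per-record payload times the page size, plus header overhead."""
    return records * record_kb + header_kb

# Eight records of 1.5 KB each plus a 1 KB header.
print(ride_history_response_kb(8))  # 13.0
```

Pagination keeps this number bounded: a larger page would grow the response (and its download time) linearly.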
Since this is a standard GET request, we’ll only consider response size, which affects response time. Here’s the response time calculation:
Response Time Calculator for Get Rides History
| Metric | Value | Unit |
|---|---|---|
| Response size | 13 | KB |
| Minimum latency | 195.7 | ms |
| Maximum latency | 276.7 | ms |
| Minimum response time | 199.7 | ms |
| Maximum response time | 280.7 | ms |
Assuming the response size is 13 KB, the latency is calculated as follows (using the 7.36 ms/KB download rate and the 100–181 ms base RTT range implied by the calculator values):

Latency (minimum) = 100 ms + (13 KB × 7.36 ms/KB) ≈ 195.7 ms
Latency (maximum) = 181 ms + (13 KB × 7.36 ms/KB) ≈ 276.7 ms

Similarly, the response time is calculated by adding the 4 ms processing time. For the minimum response time, we use the minimum latency:

Response time (minimum) = 195.7 ms + 4 ms = 199.7 ms

For the maximum response time, we use the maximum latency:

Response time (maximum) = 276.7 ms + 4 ms = 280.7 ms
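The whole latency budget can be reproduced with a few lines of code. The constants here (81.75 ms/KB over a 260–341 ms base RTT for POST, 7.36 ms/KB over a 100–181 ms base RTT for GET, and 4 ms of processing time) are inferred from the calculator values above rather than stated authoritatively:

```python
PROCESSING_MS = 4.0  # inferred from the gap between latency and response time

def post_latency_ms(request_kb: float, base_rtt_ms: float) -> float:
    # POST: latency grows with the request size at ~81.75 ms/KB
    # on top of the base RTT.
    return base_rtt_ms + request_kb * 81.75

def get_latency_ms(response_kb: float, base_rtt_ms: float) -> float:
    # GET: base RTT plus ~7.36 ms/KB to download the response.
    return base_rtt_ms + response_kb * 7.36

# Booking a ride: 1.5 KB request, base RTT range 260-341 ms.
print(post_latency_ms(1.5, 260) + PROCESSING_MS)  # 386.625
print(post_latency_ms(1.5, 341) + PROCESSING_MS)  # 467.625

# Ride history: 13 KB response, base RTT range 100-181 ms.
print(round(get_latency_ms(13, 100) + PROCESSING_MS, 1))  # 199.7
print(round(get_latency_ms(13, 181) + PROCESSING_MS, 1))  # 280.7
```

Plugging in other payload sizes shows how quickly larger requests or pages eat into a one-second budget.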
A summary of the overall latency budget for the GET and POST requests of Uber is given below:

| Request | Minimum response time | Maximum response time |
|---|---|---|
| Booking a ride (POST) | 386.625 ms | 467.625 ms |
| Get rides history (GET) | 199.7 ms | 280.7 ms |
While the response time of a POST request may seem high at first glance, services like Uber can tolerate up to one second of delay. Therefore, we can consider the estimates above to be acceptable.
Note: For simplicity, we have not included the time spent interacting with supporting services in the response times above. Although the time spent on service-to-service interaction is much less than the time spent communicating with end users, it still adds delay and increases response times.
Summary#
In this chapter, we learned about designing an efficient API for a transport service like Uber. We discussed the key design factors and the decisions we made. We also provided the request and response formats of the messages exchanged with the endpoints that play an important role in the overall flow of the Uber service. Finally, we estimated response times to achieve near-real-time communication for the Uber service.